Obtain all the data for the Master students, starting from 2007. Compute how many months it took each master student to complete their master, for those that completed it. Partition the data between male and female students, and compute the average -- is the difference in average statistically significant?
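One way to approach the significance question without assuming normally distributed durations is a permutation test on the difference of means. The sketch below is only an illustration: the durations are made-up numbers, not IS-Academia data, and `permutation_p_value` is a hypothetical helper. A Welch t-test (for example `scipy.stats.ttest_ind` with `equal_var=False`) would be a common alternative.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns the fraction of random relabelings whose absolute mean
    difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    return extreme / n_permutations


# Made-up durations in months, for illustration only
months_female = [24, 30, 24, 36, 30, 24]
months_male = [24, 24, 30, 30, 36, 24, 30]

p_value = permutation_p_value(months_female, months_male)
print(f"difference in means: {mean(months_female) - mean(months_male):.2f} months")
print(f"p-value: {p_value:.3f}")
```

A small p-value (conventionally below 0.05) would suggest that the observed gap is unlikely to come from random labeling alone.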

Notice that master students' data is trickier than the bachelors', as there are many missing records in the IS-Academia database. Therefore, try to guess how much time a master student spent at EPFL by at least checking the distance in months between Master semestre 1 and Master semestre 2. If the Mineur field is not empty, the student should also appear registered in Master semestre 3. Last but not least, don't forget to check whether the student also has an entry in the Projet Master tables. Once you can handle this data well, compute the "average stay at EPFL" for master students. Now extract all the students with a Spécialisation and compute the "average stay" for each category of that attribute. Compared to the general average, can you find any specialization for which the difference in average is statistically significant?
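As a rough sketch of the duration estimate described above, each registration can be mapped to a semester number, and the stay is then the span from the first to the last registered semester. This assumes that every semester counts for 6 months and that gaps between registrations are time spent at EPFL; both are simplifications, and `semester_index` and `months_at_epfl` are hypothetical helper names.

```python
def semester_index(academic_year, season):
    """Map ('2007-2008', 'Autumn') to an integer that increases by 1 per semester."""
    start_year = int(academic_year.split('-')[0])
    return 2 * start_year + (0 if season == 'Autumn' else 1)


def months_at_epfl(registrations, months_per_semester=6):
    """Estimate a stay from the first to the last registered semester, inclusive.

    `registrations` is an iterable of (academic_year, season) pairs.
    Gaps between registrations are counted as time spent at EPFL.
    """
    indices = [semester_index(year, season) for year, season in registrations]
    return (max(indices) - min(indices) + 1) * months_per_semester


# Hypothetical student: Master semestre 1 to 3, then Projet Master printemps
registrations = [
    ('2007-2008', 'Autumn'),
    ('2007-2008', 'Spring'),
    ('2008-2009', 'Autumn'),
    ('2008-2009', 'Spring'),
]
print(months_at_epfl(registrations))
```

Because only the first and last semesters matter, a missing record in the middle of a student's history does not change the estimate.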


In [1]:
# Requests : make http requests to websites
import requests
# BeautifulSoup : parser to manipulate easily html content
from bs4 import BeautifulSoup
# Regular expressions
import re
# Aren't pandas awesome ?
import pandas as pd

Let's get the first page, from which we will be able to extract some interesting content !


In [2]:
# Ask for the first page on IS Academia. To see it, just type it in your browser address bar : http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICREPORTS.filter?ww_i_reportModel=133685247
r = requests.get('http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICREPORTS.filter?ww_i_reportModel=133685247')
htmlContent = BeautifulSoup(r.content, 'html.parser')

In [3]:
print(htmlContent.prettify())


<html>
 <head>
  <meta content="text/html; charset=utf-8" http-equiv="Content-Type">
   <div>
   </div>
   <title>
   </title>
   <script src="GEDPUBLICREPORTS.txt?ww_x_path=Gestac.Base.Palette_js&amp;ww_c_langue=fr" type="text/javascript">
   </script>
   <link href="GEDPUBLICREPORTS.css?ww_x_path=Gestac.Moniteur.Style" rel="stylesheet" type="text/css">
    <link href="GEDPUBLICREPORTS.css?ww_x_path=Gestac.Moniteur.StyleNavigator" rel="stylesheet" type="text/css"/>
   </link>
  </meta>
 </head>
 <body alink="#666666" bgcolor="#ffffff" link="#666666" marginheight="0" marginwidth="5" vlink="#666666">
  <div class="filtres">
   <form action="!GEDPUBLICREPORTS.filter" method="GET" name="f">
    <input name="ww_b_list" type="hidden" value="1">
     <input name="ww_i_reportmodel" type="hidden" value="133685247">
      <input name="ww_c_langue" type="hidden" value="">
       <h1 id="titre">
        Liste des étudiants inscrits par semestre
       </h1>
       <table border="0" id="format">
        <tr>
         <th>
          Format:
         </th>
        </tr>
        <tr>
         <td>
          <input checked="" name="ww_i_reportModelXsl" type="radio" value="133685270">
           html
          </input>
         </td>
        </tr>
        <tr>
         <td>
          <input name="ww_i_reportModelXsl" type="radio" value="133685271">
           xls
          </input>
         </td>
        </tr>
       </table>
       <h1>
       </h1>
       <table border="0" id="filtre">
        <tr>
         <th>
          Unité académique
         </th>
         <td>
          <input name="zz_x_UNITE_ACAD" type="hidden" value="">
           <select name="ww_x_UNITE_ACAD" onchange="document.f.zz_x_UNITE_ACAD.value=document.f.ww_x_UNITE_ACAD.options[document.f.ww_x_UNITE_ACAD.selectedIndex].text">
            <option value="null">
            </option>
            <option value="942293">
             Architecture
            </option>
            <option value="246696">
             Chimie et génie chimique
            </option>
            <option value="943282">
             Cours de mathématiques spéciales
            </option>
            <option value="637841336">
             EME (EPFL Middle East)
            </option>
            <option value="942623">
             Génie civil
            </option>
            <option value="944263">
             Génie mécanique
            </option>
            <option value="943936">
             Génie électrique et électronique
            </option>
            <option value="2054839157">
             Humanités digitales
            </option>
            <option value="249847">
             Informatique
            </option>
            <option value="120623110">
             Ingénierie financière
            </option>
            <option value="946882">
             Management de la technologie
            </option>
            <option value="944590">
             Mathématiques
            </option>
            <option value="945244">
             Microtechnique
            </option>
            <option value="945571">
             Physique
            </option>
            <option value="944917">
             Science et génie des matériaux
            </option>
            <option value="942953">
             Sciences et ingénierie de l'environnement
            </option>
            <option value="945901">
             Sciences et technologies du vivant
            </option>
            <option value="1574548993">
             Section FCUE
            </option>
            <option value="946228">
             Systèmes de communication
            </option>
           </select>
          </input>
         </td>
        </tr>
        <tr>
         <th>
          Période académique
         </th>
         <td>
          <input name="zz_x_PERIODE_ACAD" type="hidden" value="">
           <select name="ww_x_PERIODE_ACAD" onchange="document.f.zz_x_PERIODE_ACAD.value=document.f.ww_x_PERIODE_ACAD.options[document.f.ww_x_PERIODE_ACAD.selectedIndex].text">
            <option value="null">
            </option>
            <option value="355925344">
             2016-2017
            </option>
            <option value="213638028">
             2015-2016
            </option>
            <option value="213637922">
             2014-2015
            </option>
            <option value="213637754">
             2013-2014
            </option>
            <option value="123456101">
             2012-2013
            </option>
            <option value="123455150">
             2011-2012
            </option>
            <option value="39486325">
             2010-2011
            </option>
            <option value="978195">
             2009-2010
            </option>
            <option value="978187">
             2008-2009
            </option>
            <option value="978181">
             2007-2008
            </option>
           </select>
          </input>
         </td>
        </tr>
        <tr>
         <th>
          Période pédagogique
         </th>
         <td>
          <input name="zz_x_PERIODE_PEDAGO" type="hidden" value="">
           <select name="ww_x_PERIODE_PEDAGO" onchange="document.f.zz_x_PERIODE_PEDAGO.value=document.f.ww_x_PERIODE_PEDAGO.options[document.f.ww_x_PERIODE_PEDAGO.selectedIndex].text">
            <option value="null">
            </option>
            <option value="249108">
             Bachelor semestre 1
            </option>
            <option value="249114">
             Bachelor semestre 2
            </option>
            <option value="942155">
             Bachelor semestre 3
            </option>
            <option value="942163">
             Bachelor semestre 4
            </option>
            <option value="942120">
             Bachelor semestre 5
            </option>
            <option value="2226768">
             Bachelor semestre 5b
            </option>
            <option value="942175">
             Bachelor semestre 6
            </option>
            <option value="2226785">
             Bachelor semestre 6b
            </option>
            <option value="2230106">
             Master semestre 1
            </option>
            <option value="942192">
             Master semestre 2
            </option>
            <option value="2230128">
             Master semestre 3
            </option>
            <option value="2230140">
             Master semestre 4
            </option>
            <option value="2335667">
             Mineur semestre 1
            </option>
            <option value="2335676">
             Mineur semestre 2
            </option>
            <option value="2063602308">
             Mise à niveau
            </option>
            <option value="249127">
             Projet Master automne
            </option>
            <option value="3781783">
             Projet Master printemps
            </option>
            <option value="953159">
             Semestre automne
            </option>
            <option value="2754553">
             Semestre printemps
            </option>
            <option value="953137">
             Stage automne 3ème année
            </option>
            <option value="2226616">
             Stage automne 4ème année
            </option>
            <option value="983606">
             Stage printemps 3ème année
            </option>
            <option value="2226626">
             Stage printemps 4ème année
            </option>
            <option value="2227132">
             Stage printemps master
            </option>
           </select>
          </input>
         </td>
        </tr>
        <tr>
         <th>
          Type de semestre
         </th>
         <td>
          <input name="zz_x_HIVERETE" type="hidden" value="">
           <select name="ww_x_HIVERETE" onchange="document.f.zz_x_HIVERETE.value=document.f.ww_x_HIVERETE.options[document.f.ww_x_HIVERETE.selectedIndex].text">
            <option value="null">
            </option>
            <option value="2936286">
             Semestre d'automne
            </option>
            <option value="2936295">
             Semestre de printemps
            </option>
           </select>
          </input>
         </td>
        </tr>
       </table>
       <input name="dummy" type="submit" value="ok"/>
      </input>
     </input>
    </input>
   </form>
   <script type="text/javascript">
    function loadReport(x) {
    var querysup='';
    writeRunning(top.principal);
    for (i=0; document.f.elements.length > i; i++){
     if (document.f.elements[i].checked){
      querysup=querysup+'&'+document.f.elements[i].name+'='+document.f.elements[i].value;
     }
     if (document.f.elements[i].type=='select-one') {
      querysup=querysup+'&'+document.f.elements[i].name+'='+document.f.elements[i].options[document.f.elements[i].selectedIndex].value;
     }
     if (document.f.elements[i].type=='text'){
      querysup=querysup+'&'+document.f.elements[i].name+'='+document.f.elements[i].value;
     }
    }
    parent.principal.location = "!GEDPUBLICREPORTS.bhtml?"+x+"&ww_i_reportModel=133685247"+querysup;
    if (navigator.userAgent.toUpperCase().indexOf('SAFARI') != -1) {
     document.location.reload();
    }
   }
   </script>
   <h1>
   </h1>
   <table border="0">
   </table>
  </div>
 </body>
</html>
<!-- OpenXml:0.00s  agent ctrl:0.00s  xml:0.52s  xsl clob before parse:0.00s  xsl extr&stylesheet:0.00s  xsl clob before parse:0.00s  xsl after parsing:0.00s  xsl ctrl data:0.00s  transform 2:0.05s  xsl process:0.00s  -->

Now we need to make other requests to IS Academia, which specify every parameter : computer science students, all the years, and all master semesters (each of which is a pair of values : pedagogic period and semester type). Thus, we're going to get all the parameters we need to make the next request :


In [4]:
# We first get the "Computer science" value
computerScienceField = htmlContent.find('option', text='Informatique')
computerScienceField


Out[4]:
<option value="249847">Informatique</option>

In [5]:
computerScienceValue = computerScienceField.get('value')
computerScienceValue


Out[5]:
'249847'

In [6]:
# Then, we're going to need all the academic years values.
academicYearsField = htmlContent.find('select', attrs={'name':'ww_x_PERIODE_ACAD'})
academicYearsSet = academicYearsField.findAll('option')

# Since there are several years to remember, we store all of them in a list to use them later
academicYearValues = []
# We also keep the textual content in a list ("2016-2017", "2015-2016"...)
academicYearContent = []

for option in academicYearsSet:
    value = option.get('value')
    # However, we don't want any "null" value
    if value != 'null':
        academicYearValues.append(value)
        academicYearContent.append(option.text)

In [7]:
# Now we have all the academic years that might interest us. We wrangle them a little so that we can build the requests more easily later.
academicYearValues_series = pd.Series(academicYearValues)
academicYearContent_series = pd.Series(academicYearContent)
academicYear_df = pd.concat([academicYearContent_series, academicYearValues_series], axis = 1)
academicYear_df.columns= ['Academic_year', 'Value']
academicYear_df = academicYear_df.sort_values(['Academic_year', 'Value'], ascending=[1, 0])
academicYear_df


Out[7]:
Academic_year Value
9 2007-2008 978181
8 2008-2009 978187
7 2009-2010 978195
6 2010-2011 39486325
5 2011-2012 123455150
4 2012-2013 123456101
3 2013-2014 213637754
2 2014-2015 213637922
1 2015-2016 213638028
0 2016-2017 355925344

In [8]:
# Then, let's get all the pedagogic periods we need. It's a little more complicated here because we need to link each pedagogic period with a season (eg : Master semestre 1 is autumn, Master semestre 2 is spring etc.)
# Thus, we need more than the pedagogic values. To associate them with the right season, we also need the actual textual value ("Master semestre 1", "Master semestre 2" etc.)
pedagogicPeriodsField = htmlContent.find('select', attrs={'name':'ww_x_PERIODE_PEDAGO'})
pedagogicPeriodsSet = pedagogicPeriodsField.findAll('option')

# Same as above, we'll store the values in a list
pedagogicPeriodValues = []
# We also keep the textual content in a list ("Master semestre 1", "Master semestre 2"...)
pedagogicPeriodContent = []

for option in pedagogicPeriodsSet:
    value = option.get('value')
    if value != 'null':
        pedagogicPeriodValues.append(value)
        pedagogicPeriodContent.append(option.text)

In [9]:
# Let's make the values and content meet each other
pedagogicPeriodContent_series = pd.Series(pedagogicPeriodContent)
pedagogicPeriodValues_series = pd.Series(pedagogicPeriodValues)
pedagogicPeriod_df = pd.concat([pedagogicPeriodContent_series, pedagogicPeriodValues_series], axis = 1)
pedagogicPeriod_df.columns = ['Pedagogic_period', 'Value']

In [10]:
# We keep all semesters related to master students
pedagogicPeriod_df_master = pedagogicPeriod_df[[period.startswith('Master') for period in pedagogicPeriod_df.Pedagogic_period]]
pedagogicPeriod_df_minor = pedagogicPeriod_df[[period.startswith('Mineur') for period in pedagogicPeriod_df.Pedagogic_period]]
pedagogicPeriod_df_project = pedagogicPeriod_df[[period.startswith('Projet Master') for period in pedagogicPeriod_df.Pedagogic_period]]

pedagogicPeriod_df = pd.concat([pedagogicPeriod_df_master, pedagogicPeriod_df_minor, pedagogicPeriod_df_project])
pedagogicPeriod_df


Out[10]:
Pedagogic_period Value
8 Master semestre 1 2230106
9 Master semestre 2 942192
10 Master semestre 3 2230128
11 Master semestre 4 2230140
12 Mineur semestre 1 2335667
13 Mineur semestre 2 2335676
15 Projet Master automne 249127
16 Projet Master printemps 3781783

In [11]:
# Lastly, we need to extract the values associated with autumn and spring semesters.
semesterTypeField = htmlContent.find('select', attrs={'name':'ww_x_HIVERETE'})
semesterTypeSet = semesterTypeField.findAll('option')

# Again, we need to store the values in a list
semesterTypeValues = []
# We also keep the textual content in a list
semesterTypeContent = []

for option in semesterTypeSet:
    value = option.get('value')
    if value != 'null':
        semesterTypeValues.append(value)
        semesterTypeContent.append(option.text)

In [12]:
# Here are the values for autumn and spring semester :

semesterTypeValues_series = pd.Series(semesterTypeValues)
semesterTypeContent_series = pd.Series(semesterTypeContent)
semesterType_df = pd.concat([semesterTypeContent_series, semesterTypeValues_series], axis = 1)
semesterType_df.columns = ['Semester_type', 'Value']
semesterType_df


Out[12]:
Semester_type Value
0 Semestre d'automne 2936286
1 Semestre de printemps 2936295
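
The three extraction cells above follow the same pattern (find a `select`, skip the "null" option, keep label and value), so they could be factored into one helper. The sketch below is only a suggestion: `select_options` is a hypothetical name, and it runs on a trimmed-down copy of the form instead of a live request.

```python
from bs4 import BeautifulSoup


def select_options(soup, select_name):
    """Return {label: value} for every non-null <option> of the named <select>."""
    select = soup.find('select', attrs={'name': select_name})
    return {
        option.text.strip(): option.get('value')
        for option in select.find_all('option')
        if option.get('value') != 'null'
    }


# Trimmed-down copy of the form, used here instead of a live request
sample_html = """
<select name="ww_x_HIVERETE">
  <option value="null"></option>
  <option value="2936286">Semestre d'automne</option>
  <option value="2936295">Semestre de printemps</option>
</select>
"""
soup = BeautifulSoup(sample_html, 'html.parser')
semester_types = select_options(soup, 'ww_x_HIVERETE')
print(semester_types)
```

A dictionary keyed by label makes later lookups direct, for example `semester_types["Semestre d'automne"]`.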

Now we have all the information needed to get all the master students ! Let's make all the requests we need to build our data. We will send requests such as :

  • Get students from master semester 1 of 2007-2008
  • ...
  • Get students from master semester 4 of 2007-2008
  • Get students from mineur semester 1 of 2007-2008
  • Get students from mineur semester 2 of 2007-2008
  • Get students from master project autumn of 2007-2008
  • Get students from master project spring of 2007-2008

... and so on for each academic year until 2015-2016, the last complete year. We can even take the first semester of 2016-2017 into account, to check whether some students we thought had finished last year are actually still studying. This can happen for different reasons : doing a mineur, a project, repeating a semester...
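
The enumeration described above can be sketched with `itertools.product`. The period labels are copied from the form; the list of academic years is shortened for the example, and `season_of` is a hypothetical helper that applies the rule "semesters 1 and 3 and autumn projects are in autumn, everything else in spring".

```python
from itertools import product

# Labels copied from the form; the academic years are a shortened sample
academic_years = ['2007-2008', '2008-2009']
pedagogic_periods = [
    'Master semestre 1', 'Master semestre 2',
    'Master semestre 3', 'Master semestre 4',
    'Mineur semestre 1', 'Mineur semestre 2',
    'Projet Master automne', 'Projet Master printemps',
]


def season_of(period):
    """Odd-numbered semesters and 'automne' projects take place in autumn."""
    return 'Autumn' if period.endswith(('1', '3', 'automne')) else 'Spring'


combinations = [
    (year, period, season_of(period))
    for year, period in product(academic_years, pedagogic_periods)
]
for combination in combinations[:3]:
    print(combination)
print(len(combinations), 'requests to build')
```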

We can ask for a list of students in two formats : HTML or XLS. We chose the HTML format because this is the first time that we wrangle data in HTML, and learning it should be really useful for working with most websites in the future ! The request sent by the browser to IS Academia, to get a list of students in HTML format, looks like this :

http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICREPORTS.html?arg1=xxx&arg2=yyy

where "xxx" is the value associated with the argument named "arg1", "yyy" the value associated with the argument named "arg2" etc. The real request has many more arguments. We sent a request as a "human" through our browser and intercepted it with Postman Interceptor. We found that the following arguments have to be sent :

  • ww_x_GPS = -1
  • ww_i_reportModel = 133685247
  • ww_i_reportModelXsl = 133685270
  • ww_x_UNITE_ACAD = 249847 (which is the value of computer science !)
  • ww_x_PERIODE_ACAD = X (eg : the value corresponding to 2007-2008 would be 978181)
  • ww_x_PERIODE_PEDAGO = Y (eg : 2230106 for Master semestre 1)
  • ww_x_HIVERETE = Z (eg : 2936286 for autumn semester)

The last three values X, Y and Z must be replaced with the ones we extracted previously. For instance, if we want to get students from Master semestre 1 (which is necessarily an autumn semester) of 2007-2008, the GET request would be the following :

http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICREPORTS.html?ww_x_GPS=-1&ww_i_reportModel=133685247&ww_i_reportModelXsl=133685270&ww_x_UNITE_ACAD=249847&ww_x_PERIODE_ACAD=978181&ww_x_PERIODE_PEDAGO=2230106&ww_x_HIVERETE=2936286
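
As an alternative to concatenating strings by hand, the same URL can be assembled with `urllib.parse.urlencode` from the standard library. This is a minimal sketch: `build_request` and `BASE_URL` are hypothetical names, and the fixed parameter values are the ones from the example URL above. Passing the same dictionary through the `params` argument of `requests.get` would be another option.

```python
from urllib.parse import urlencode

BASE_URL = 'http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICREPORTS.html'


def build_request(academic_year_value, pedagogic_period_value, semester_type_value):
    """Assemble the report URL from the three values that vary per request."""
    params = {
        'ww_x_GPS': '-1',
        'ww_i_reportModel': '133685247',
        'ww_i_reportModelXsl': '133685270',
        'ww_x_UNITE_ACAD': '249847',  # Informatique
        'ww_x_PERIODE_ACAD': academic_year_value,
        'ww_x_PERIODE_PEDAGO': pedagogic_period_value,
        'ww_x_HIVERETE': semester_type_value,
    }
    return BASE_URL + '?' + urlencode(params)


# Master semestre 1, autumn 2007-2008
url = build_request('978181', '2230106', '2936286')
print(url)
```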

So let's cook all the requests we're going to send !


In [13]:
# Let's put the semester types aside, because we're going to need them
autumn_semester_value = semesterType_df.loc[semesterType_df['Semester_type'] == 'Semestre d\'automne', 'Value']
autumn_semester_value = autumn_semester_value.iloc[0]

spring_semester_value = semesterType_df.loc[semesterType_df['Semester_type'] == 'Semestre de printemps', 'Value']
spring_semester_value = spring_semester_value.iloc[0]

In [14]:
# Here is the list of the GET requests we will send to IS Academia
requestsToISAcademia = []

# We also keep the information associated with each request to help wrangling the data later :
academicYearRequests = []
pedagogicPeriodRequests = []
semesterTypeRequests = []

# Go through all the years ('2007-2008', '2008-2009' and so on)
for academicYear_row in academicYear_df.itertuples(index=True, name='Academic_year'):
    
    # The year (eg: '2007-2008')
    academicYear = academicYear_row.Academic_year
    
    # The associated value (eg: '978181')
    academicYear_value = academicYear_row.Value
    
    # We get all the pedagogic periods associated with this academic year
    for pedagogicPeriod_row in pedagogicPeriod_df.itertuples(index=True, name='Pedagogic_period'):

        # The period (eg: 'Master semestre 1')
        pedagogicPeriod = pedagogicPeriod_row.Pedagogic_period

        # The associated value (eg: '2230106')
        pedagogicPeriod_Value = pedagogicPeriod_row.Value

        # We need to associate the corresponding semester type (eg: Master semestre 1 is autumn, but Master semestre 2 is spring)
        if pedagogicPeriod.endswith(('1', '3', 'automne')):
            semester_Value = autumn_semester_value
            semester = 'Autumn'
        else:
            semester_Value = spring_semester_value
            semester = 'Spring'

        # This print line is only for debugging if you want to check something
        # print("academic year = " + academicYear_value + ", pedagogic value = " + pedagogicPeriod_Value + ", pedagogic period is " + pedagogicPeriod + " (semester type value = " + semester_Value + ")")

        # We're ready to cook the request !
        request = 'http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICREPORTS.html?ww_x_GPS=-1&ww_i_reportModel=133685247&ww_i_reportModelXsl=133685270&ww_x_UNITE_ACAD=' + computerScienceValue
        request = request + '&ww_x_PERIODE_ACAD=' + academicYear_value
        request = request + '&ww_x_PERIODE_PEDAGO=' + pedagogicPeriod_Value
        request = request + '&ww_x_HIVERETE=' + semester_Value
        
        # Add the newly created request to our wish list...
        requestsToISAcademia.append(request)
        # And we save the corresponding information for each request
        pedagogicPeriodRequests.append(pedagogicPeriod)
        academicYearRequests.append(academicYear)
        semesterTypeRequests.append(semester)

In [15]:
# Here is the list of all the requests we have to send !
# requestsToISAcademia

In [16]:
# Here are the corresponding years for each request
# academicYearRequests

In [17]:
# Same for associated pedagogic periods
# pedagogicPeriodRequests

In [18]:
# Last but not least, the semester types
# semesterTypeRequests

In [19]:
academicYearRequests_series = pd.Series(academicYearRequests)
pedagogicPeriodRequests_series = pd.Series(pedagogicPeriodRequests)
requestsToISAcademia_series = pd.Series(requestsToISAcademia)

# Let's summarize everything in a dataframe...
requests_df = pd.concat([academicYearRequests_series, pedagogicPeriodRequests_series, requestsToISAcademia_series], axis = 1)
requests_df.columns = ['Academic_year', 'Pedagogic_period', 'Request']

requests_df


Out[19]:
Academic_year Pedagogic_period Request
0 2007-2008 Master semestre 1 http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICRE...
1 2007-2008 Master semestre 2 http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICRE...
2 2007-2008 Master semestre 3 http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICRE...
3 2007-2008 Master semestre 4 http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICRE...
4 2007-2008 Mineur semestre 1 http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICRE...
5 2007-2008 Mineur semestre 2 http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICRE...
6 2007-2008 Projet Master automne http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICRE...
7 2007-2008 Projet Master printemps http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICRE...
8 2008-2009 Master semestre 1 http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICRE...
9 2008-2009 Master semestre 2 http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICRE...
10 2008-2009 Master semestre 3 http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICRE...
11 2008-2009 Master semestre 4 http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICRE...
12 2008-2009 Mineur semestre 1 http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICRE...
13 2008-2009 Mineur semestre 2 http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICRE...
14 2008-2009 Projet Master automne http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICRE...
15 2008-2009 Projet Master printemps http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICRE...
16 2009-2010 Master semestre 1 http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICRE...
17 2009-2010 Master semestre 2 http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICRE...
18 2009-2010 Master semestre 3 http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICRE...
19 2009-2010 Master semestre 4 http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICRE...
20 2009-2010 Mineur semestre 1 http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICRE...
21 2009-2010 Mineur semestre 2 http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICRE...
22 2009-2010 Projet Master automne http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICRE...
23 2009-2010 Projet Master printemps http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICRE...
24 2010-2011 Master semestre 1 http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICRE...
25 2010-2011 Master semestre 2 http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICRE...
26 2010-2011 Master semestre 3 http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICRE...
27 2010-2011 Master semestre 4 http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICRE...
28 2010-2011 Mineur semestre 1 http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICRE...
29 2010-2011 Mineur semestre 2 http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICRE...
... ... ... ...
50 2013-2014 Master semestre 3 http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICRE...
51 2013-2014 Master semestre 4 http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICRE...
52 2013-2014 Mineur semestre 1 http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICRE...
53 2013-2014 Mineur semestre 2 http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICRE...
54 2013-2014 Projet Master automne http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICRE...
55 2013-2014 Projet Master printemps http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICRE...
56 2014-2015 Master semestre 1 http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICRE...
57 2014-2015 Master semestre 2 http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICRE...
58 2014-2015 Master semestre 3 http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICRE...
59 2014-2015 Master semestre 4 http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICRE...
60 2014-2015 Mineur semestre 1 http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICRE...
61 2014-2015 Mineur semestre 2 http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICRE...
62 2014-2015 Projet Master automne http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICRE...
63 2014-2015 Projet Master printemps http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICRE...
64 2015-2016 Master semestre 1 http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICRE...
65 2015-2016 Master semestre 2 http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICRE...
66 2015-2016 Master semestre 3 http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICRE...
67 2015-2016 Master semestre 4 http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICRE...
68 2015-2016 Mineur semestre 1 http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICRE...
69 2015-2016 Mineur semestre 2 http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICRE...
70 2015-2016 Projet Master automne http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICRE...
71 2015-2016 Projet Master printemps http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICRE...
72 2016-2017 Master semestre 1 http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICRE...
73 2016-2017 Master semestre 2 http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICRE...
74 2016-2017 Master semestre 3 http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICRE...
75 2016-2017 Master semestre 4 http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICRE...
76 2016-2017 Mineur semestre 1 http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICRE...
77 2016-2017 Mineur semestre 2 http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICRE...
78 2016-2017 Projet Master automne http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICRE...
79 2016-2017 Projet Master printemps http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICRE...

80 rows × 3 columns


The requests are now ready to be sent to IS Academia. Let's try it out !

TIME OUT : We stopped right here for our homework. What follows is only the beginning of a loop that gets the student lists from IS Academia. It is not finished at all :(
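
For reference, one possible shape for such a loop is sketched here. It is not the finished homework: `fetch_all` is a hypothetical helper that takes the download function as an argument, so that the example can run offline with a stand-in fetcher. With the real site, a function built on `requests.get` could be passed instead.

```python
import time


def fetch_all(urls, fetch, pause_seconds=0.5):
    """Call `fetch(url)` for every URL and collect the results by URL.

    A failed request is recorded as None so that one bad page
    does not abort the whole run.
    """
    pages = {}
    for url in urls:
        try:
            pages[url] = fetch(url)
        except Exception as error:  # broad on purpose in this sketch
            print(f'request failed for {url}: {error}')
            pages[url] = None
        time.sleep(pause_seconds)  # be gentle with the server
    return pages


# Stand-in fetcher so the sketch runs offline; with the real site one could
# pass something like: lambda url: requests.get(url, timeout=30).content
def fake_fetch(url):
    if 'broken' in url:
        raise ValueError('simulated network error')
    return '<html>' + url + '</html>'


pages = fetch_all(['http://example.org/a', 'http://example.org/broken'],
                  fake_fetch, pause_seconds=0)
print(pages)
```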


In [20]:
# WARNING : THE NEXT LINE IS COMMENTED OUT TO DEBUG THE FIRST REQUEST ONLY. UNCOMMENT IT AND INDENT THE CODE CORRECTLY TO SEND ALL THE REQUESTS

#for request in requestsToISAcademia: # LINE TO UNCOMMENT TO SEND ALL REQUESTS
request = requestsToISAcademia[0] # LINE TO COMMENT TO SEND ALL REQUESTS
print(request)

# Send the request to IS Academia
r = requests.get(request)

# Here is the HTML content of IS Academia's response
htmlContent = BeautifulSoup(r.content, 'html.parser')

# Let's extract some data...
computerScienceField = htmlContent.find('option', text='Informatique')


http://isa.epfl.ch/imoniteur_ISAP/!GEDPUBLICREPORTS.html?ww_x_GPS=-1&ww_i_reportModel=133685247&ww_i_reportModelXsl=133685270&ww_x_UNITE_ACAD=249847&ww_x_PERIODE_ACAD=978181&ww_x_PERIODE_PEDAGO=2230106&ww_x_HIVERETE=2936286

In [21]:
# Getting the table of students
# Let's make the columns
columns = []
table = htmlContent.find('table')
th = table.find('th', text='Civilité')
columns.append(th.text)
# Go through the table until the last column
while th.findNext('').name == 'th':
    th = th.findNext('')
    columns.append(th.text)
    
# This array will contain all the students    
studentsTable = []

DON'T RUN THE NEXT CELL OR IT WILL CRASH ! :x


In [22]:
# Getting the information about the student we're "looping on"
currentStudent = []
tr = th.findNext('tr')
children = tr.children
for child in children:
    currentStudent.append(child.text)
    
# Add the student to the array    
studentsTable.append(currentStudent)
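
The cell above reads a single row. A minimal sketch of how every row could be turned into a record is given here, under stated assumptions: it runs on a shortened, made-up table rather than the live page, the header names other than 'Civilité' are assumptions, the students are fictional, and `parse_students` is a hypothetical helper.

```python
from bs4 import BeautifulSoup

# Shortened table for illustration; apart from 'Civilité', the header
# names are assumptions and the students are fictional
sample_html = """
<table>
  <tr><th>Civilité</th><th>Nom Prénom</th><th>Statut</th><th>No Sciper</th></tr>
  <tr><td>Madame</td><td>Doe Jane</td><td>Présent</td><td>000001</td></tr>
  <tr><td>Monsieur</td><td>Doe John</td><td>Présent</td><td>000002</td></tr>
</table>
"""


def parse_students(soup):
    """Return one dict per data row, keyed by the header cells."""
    header_row = soup.find('th').find_parent('tr')
    columns = [th.get_text(strip=True) for th in header_row.find_all('th')]
    students = []
    for tr in header_row.find_next_siblings('tr'):
        cells = [td.get_text(strip=True) for td in tr.find_all('td')]
        if len(cells) == len(columns):  # skip rows that are not student records
            students.append(dict(zip(columns, cells)))
    return students


students = parse_students(BeautifulSoup(sample_html, 'html.parser'))
for student in students:
    print(student)
```

A list of dictionaries like this can be handed directly to `pd.DataFrame` for the later analysis.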

In [23]:
a = tr.findNext('tr')
a


Out[23]:
<tr><td style="white-space:nowrap">Madame</td><td style="white-space:nowrap">Agarwal Megha</td><td style="white-space:nowrap"></td><td style="white-space:nowrap"></td><td style="white-space:nowrap"></td><td style="white-space:nowrap"></td><td style="white-space:nowrap"></td><td style="white-space:nowrap">Présent</td><td style="white-space:nowrap"></td><td style="white-space:nowrap"></td><td>180027</td><td style="white-space:nowrap"></td></tr>

In [ ]:
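# Walk through the remaining rows; stop when there is no further <tr>
tr = tr.findNext('tr')
while tr is not None:
    # Start a fresh record for each student
    currentStudent = [child.text for child in tr.children]
    studentsTable.append(currentStudent)
    # Advance from the current row (not from the header), otherwise the loop never ends
    tr = tr.findNext('tr')

studentsTable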

In [ ]:
#tr = th.parent
#td = th.findNext('td')
#td.text
#th.findNext('th')
#th.findNext('th')
#tr = tr.findNext('tr')
#tr

In [ ]:
print(htmlContent.prettify())